
Figure 1. The conceptual parsing architecture.

Input buffer: the data structure that contains the character string to be parsed. 
We assume the characters are encoded in Unicode.



FIG. 1 



concept StreetSingle



the concept in the knowledge base



ADDRESS_DATA 

( AUSTRALIA UNITED_STATES BRITAIN CANADA NEW_ZEALAND )



StreetLevelObjects 



slot streetNumber { :TYPE NumericLocater :OPTIONAL 1 }
slot streetName { :TYPE name* }
slot streetType { :TYPE StreetClassifier }
slot orientation { :TYPE OrientationClassifier :OPTIONAL 1 }




grammatical mapping 



:phrase <NumericLocater, name*, StreetClassifier, OrientationClassifier?>

this.bind(streetNumber, this.pattern.phrase[0]),
this.bind(streetName, this.pattern.phrase[1]),
this.bind(streetType, this.pattern.phrase[2]),
this.bind(orientation, this.pattern.phrase[3])
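
The grammatical mapping above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the encoding of the pattern as (lexical type, quantifier) pairs, the reading of `*` as "zero or more" and `?` as "zero or one", the greedy matching strategy, and the names `PATTERN`, `SLOTS`, `match_phrase` and `bind_slots` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of the :phrase pattern as (lexical type, quantifier).
# "1" = exactly one, "*" = zero or more, "?" = zero or one (assumed semantics).
PATTERN = [
    ("NumericLocater", "1"),
    ("name", "*"),
    ("StreetClassifier", "1"),
    ("OrientationClassifier", "?"),
]
# Slot i is bound to phrase[i], mirroring the this.bind(...) calls.
SLOTS = ["streetNumber", "streetName", "streetType", "orientation"]

Token = Tuple[str, str]  # (text, lexical type)


def match_phrase(tokens: List[Token]) -> Optional[List[List[str]]]:
    """Greedily match tokens against PATTERN; return one group per element."""
    groups: List[List[str]] = []
    i = 0
    for lex_type, quant in PATTERN:
        group: List[str] = []
        limit = None if quant == "*" else 1
        while (
            i < len(tokens)
            and tokens[i][1] == lex_type
            and (limit is None or len(group) < limit)
        ):
            group.append(tokens[i][0])
            i += 1
        if quant == "1" and not group:
            return None  # a mandatory element is missing
        groups.append(group)
    # Every token must be consumed for the phrase to match.
    return groups if i == len(tokens) else None


def bind_slots(tokens: List[Token]) -> Optional[Dict[str, Optional[str]]]:
    """Bind each slot to the text matched by the pattern element at its index."""
    groups = match_phrase(tokens)
    if groups is None:
        return None
    return {slot: " ".join(g) if g else None for slot, g in zip(SLOTS, groups)}
```

Greedy matching is enough here only because the four lexical types in the pattern are distinct; a pattern with overlapping types would need backtracking.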
Compiled knowledge base is
maintained in an object store
parser asks KB explorer to propose
lexicogrammatical structures associated with
a given token



KB explorer locates the KB partition that is 
specific to the language and data type of the 
text being parsed 


KB explorer searches the lexicon of the KB 
partition for an entry whose orthographic form 
matches that of the token 
parser asks KB explorer to propose
lexicogrammatical structures associated with
a given token



KB navigation service locates the KB partition 
that is specific to the language and data type 
of the text being parsed 



KB navigation service searches the lexicon of
the KB partition for an entry whose
orthographic form matches that of the token




Yes 






KB navigation service searches the lexical
usage dictionary of the KB partition for all the
usages of that lexical entry; the usages are
returned to the parser.
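
The lookup steps in the boxes above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `KBPartition` and `KBNavigationService`, the method `propose`, the keying of partitions by a (language, data type) pair, and the case-folding of the orthographic form are hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KBPartition:
    # orthographic form -> lexical entry identifier (hypothetical layout)
    lexicon: Dict[str, str] = field(default_factory=dict)
    # lexical entry identifier -> usages of that entry
    usage_dictionary: Dict[str, List[str]] = field(default_factory=dict)


class KBNavigationService:
    def __init__(self) -> None:
        # One partition per (language, data type), as in the flowchart.
        self.partitions: Dict[Tuple[str, str], KBPartition] = {}

    def propose(self, token: str, language: str, data_type: str) -> List[str]:
        """Return all usages of the lexical entry matching the token."""
        # Step 1: locate the partition for this language and data type.
        partition = self.partitions.get((language, data_type))
        if partition is None:
            return []
        # Step 2: search the lexicon by orthographic form
        # (case-insensitive matching is an assumption).
        entry = partition.lexicon.get(token.lower())
        if entry is None:
            return []
        # Step 3: collect the usages from the lexical usage dictionary.
        return list(partition.usage_dictionary.get(entry, []))
```

For example, a partition for English address data could map the orthographic form "lane" to an entry whose usages are `thoroughFareClassifier` and `properName`, which is the ambiguity shown for the token "Lane" in the token tables.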




invoke the information structure service 
create information structure from the matched 
lexicogrammatical pattern 



FIG. 9 



KB explorer searches the knowledge instances 
including the semantic concepts and grammatical 
structures for information structures associated with a 
given lexicogrammatical pattern 



parser maintains the parser search space by building
links between tokens and the matched
lexicogrammatical pattern, as well as links between
the lexicogrammatical pattern and the created
information structures



refine the existing information structures in the search 
space by applying refinement operators on the newly 
created information structures 
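
The two kinds of links the parser maintains (token to matched pattern, and pattern to created information structure) can be pictured with a small Python sketch. The class `ParserSearchSpace` and its method names are hypothetical, and the refinement operators themselves are not modelled because the figure does not specify them.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


class ParserSearchSpace:
    """Hypothetical container for the links described in the flowchart."""

    def __init__(self) -> None:
        # token index -> identifiers of patterns matched over that token
        self.token_to_patterns: Dict[int, Set[str]] = defaultdict(set)
        # pattern identifier -> information structures created from it
        self.pattern_to_structures: Dict[str, List[Any]] = defaultdict(list)

    def link_tokens(self, token_indices: Iterable[int], pattern_id: str) -> None:
        """Record that a pattern was matched over the given tokens."""
        for index in token_indices:
            self.token_to_patterns[index].add(pattern_id)

    def link_structure(self, pattern_id: str, structure: Any) -> None:
        """Record an information structure created from a matched pattern."""
        self.pattern_to_structures[pattern_id].append(structure)

    def structures_for_token(self, index: int) -> List[Any]:
        """Follow both links: token -> patterns -> information structures."""
        found: List[Any] = []
        for pattern_id in sorted(self.token_to_patterns.get(index, set())):
            found.extend(self.pattern_to_structures.get(pattern_id, []))
        return found
```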
[...] 22 Fontenoy road,



existing information structure 



SubdwellingAddress 



refined information structure 



FIG. 11






Example address: Fontenoy road and Cunon street, ...



existing information structure




StreetIntersection
Example address: 22 Fontenoy road, Ryde, NSW 2113



existing information structure 



refined information structure



ResidentialAddress 



new information structure



AustralianAddress 
state NSW
areacode 2113



existing information structure 

refined information structure 
Example address: Dept. of computer science, school of engineering, Univ. of Sydney, ...



existing information structure 



InstitutionalAddress 



InstitutionalAddress 



InstitutionalComplex 



name engineer 



new information structure



InstitutionalComplex 



name engineer 



name Sydney



name Sydney 
14 LANE COVE ROAD



unit 14 A ,

<UnitClassifier><Num><Suffix><SegmentMarker>







addr002
subdwlType "unit"

addr003
locator SuffixedNumber
number 14
suffix "A"



KnowledgeSource: UnitClass
Status: activated
NextAvailableConstraint:

KnowledgeSource: UnitTypePattern
Effects:
Status: matched
NextAvailableConstraint:









<Num><RangeMarker><Num>

addr004
NumericRange
lowerBound 12
highBound 14

KnowledgeSource: NumericRange
NextAvailableConstraint: nil
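
As an illustration of how the <Num><RangeMarker><Num> pattern could yield the NumericRange structure shown above, here is a minimal Python sketch. The function name `parse_numeric_range`, the dictionary representation of the structure, and the use of "-" as the range-marker token are assumptions; the lexical type labels follow the token tables in the figures.

```python
from typing import Dict, List, Optional, Tuple, Union

Token = Tuple[str, str]  # (text, lexical type)


def parse_numeric_range(tokens: List[Token]) -> Optional[Dict[str, Union[str, int]]]:
    """Build a NumericRange structure from exactly <Num><RangeMarker><Num>."""
    if len(tokens) != 3:
        return None
    lexical_types = [lex_type for _, lex_type in tokens]
    if lexical_types != ["numeric", "range-marker", "numeric"]:
        return None
    return {
        "type": "NumericRange",
        "lowerBound": int(tokens[0][0]),
        "highBound": int(tokens[2][0]),
    }
```

Applied to the tokens for "12-14", the sketch produces lowerBound 12 and highBound 14, matching the values in the figure.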



PSS legends:
token | <lex1>
information structure partition
KnowledgeSource:
NextAvailableConstraint:
lowerBound | 12
AbstractAddress



Token   lexical type

unit    (subdwellingClassifier)
14      (numeric)
a       (affix)
        (segment-marker)
12      (numeric)
        (range-marker)
14      (numeric)
Lane    (thoroughFareClassifier) | (properName)
Cove    (thoroughFareClassifier) | (properName)
road    (thoroughFareClassifier) | (properName)
        (segment-marker)
Ryde    (properName)
        (segment-marker)
NSW     (stateName)
2113    (AreaCode)



Figure 19.1. Initial state of parsing.



FIG. 19.1 




Token   lexical type

unit    (subdwellingClassifier)
14      (numeric)
► a     (affix)
        (segment-marker)
12      (numeric)
        (range-marker)
14      (numeric)
Lane    (thoroughFareClassifier) | (properName)
Cove    (thoroughFareClassifier) | (properName)
road    (thoroughFareClassifier) | (properName)

2113    (AreaCode)



[...] is unified with the existing object addr001



Figure 19.2. Address objects built after parsing "unit 14A". 



FIG. 19.2 



SubdwellingAddress



Token   lexical type

unit    (subdwellingClassifier)

        (segment-marker)
        (numeric)
        (range-marker)
        (numeric)
        (thoroughFareClassifier) | (properName)
        (thoroughFareClassifier) | (properName)
        (thoroughFareClassifier) | (properName)
        (segment-marker)
        (properName)
        (segment-marker)
        (stateName)
        (AreaCode)



Figure 19.3. Temporary information structure held in stack.
ResidentialAddress



(range-marker)
(numeric)

(thoroughFareClassifier) | (properName)
(thoroughFareClassifier) | (properName)
(thoroughFareClassifier) | (properName)

(properName)



(a) a ThoroughFare object is created and a ResidentialAddress object addrX is created.

(b) addrX is unified with the existing addr001, elaborating the structure of the latter.



Figure 19.4. Information structure obtained after parsing "12-14 Lane Cove Road".



FIG. 19.4 
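
The unification step mentioned in Figure 19.4, where a newly created address object is merged into an existing one, can be sketched in Python as a recursive merge of nested dictionaries. This is a generic feature-structure unification written for illustration, not the method of the source: the function name `unify` and all slot names used in the example are hypothetical, and concept types are compared only for equality, whereas the figures suggest that a structure can also be refined to a more specific concept.

```python
from typing import Any, Dict, Optional

Structure = Dict[str, Any]


def unify(existing: Structure, new: Structure) -> Optional[Structure]:
    """Merge two information structures; return None if any slot conflicts."""
    result: Structure = dict(existing)
    for slot, value in new.items():
        if slot not in result:
            # The new structure elaborates the existing one with a fresh slot.
            result[slot] = value
        elif isinstance(result[slot], dict) and isinstance(value, dict):
            # Both sides hold sub-structures: unify them recursively.
            merged = unify(result[slot], value)
            if merged is None:
                return None
            result[slot] = merged
        elif result[slot] != value:
            # Atomic values disagree, so the structures cannot be unified.
            return None
    return result


# Hypothetical structures loosely modelled on the figures.
addr001 = {
    "type": "ResidentialAddress",
    "subdwelling": {"subdwlType": "unit", "locator": {"number": 14, "suffix": "A"}},
}
addr_x = {
    "type": "ResidentialAddress",
    "thoroughfare": {"number": {"lowerBound": 12, "highBound": 14}},
}
merged_address = unify(addr001, addr_x)
```

Returning a new dictionary instead of mutating `addr001` keeps the existing structure available if the parser later needs to discard the merged alternative.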



Token lexical type 




Figure 19.5. The final address information structure. 



FIG. 19.5 



